 frankle and carbin


A Recovery Guarantee for Sparse Neural Networks

arXiv.org Machine Learning

We prove the first guarantees of sparse recovery for ReLU neural networks, where the sparse network weights constitute the signal to be recovered. Specifically, we study structural properties of the sparse network weights for two-layer, scalar-output networks under which a simple iterative hard thresholding algorithm recovers these weights exactly, using memory that grows linearly in the number of nonzero weights. We validate this theoretical result with simple experiments on recovery of sparse planted MLPs, MNIST classification, and implicit neural representations. Experimentally, we find performance that is competitive with, and often exceeds, a high-performing but memory-inefficient baseline based on iterative magnitude pruning.
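For readers unfamiliar with iterative hard thresholding, a minimal sketch of its classic form for sparse linear measurements may help. This is not the paper's algorithm, which targets two-layer ReLU networks and is not reproduced in the abstract. It only illustrates the general pattern of a gradient step followed by keeping the k largest-magnitude entries. The function names `hard_threshold` and `iht`, the step-size rule, and the linear model y = Ax are all assumptions made for illustration.

```python
import numpy as np

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    keep = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest |v_i|
    out[keep] = v[keep]
    return out

def iht(A, y, k, steps=100):
    """Classic iterative hard thresholding for y = A x with x assumed k-sparse.

    Illustrative sketch only: a linear model stands in for the network,
    and the step size 1 / ||A||_2^2 is an assumed, conservative choice.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # inverse squared spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        gradient_step = x + step * (A.T @ (y - A @ x))  # least-squares descent step
        x = hard_threshold(gradient_step, k)            # project onto k-sparse vectors
    return x
```

Whether this iteration recovers the true signal depends on properties of A. The abstract's contribution is identifying analogous structural conditions for sparse network weights.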



BINGO: A Novel Pruning Mechanism to Reduce the Size of Neural Networks

arXiv.org Artificial Intelligence

Over the past decade, the use of machine learning has increased exponentially. Models are far more complex than ever before, growing to gargantuan sizes and housing millions of weights. Unfortunately, because large models have become the state of the art, it often costs millions of dollars to train and operate them. These expenses not only hurt companies but also bar non-wealthy individuals from contributing to new developments and force consumers to pay higher prices for AI. Current methods used to prune models, such as iterative magnitude pruning, have shown great accuracy but require an iterative training sequence that is computationally and environmentally taxing. To solve this problem, BINGO is introduced. During the training pass, BINGO studies specific subsets of a neural network one at a time to gauge how significant a role each weight plays in the network's accuracy. By the time training is done, BINGO has generated a significance score for each weight, allowing insignificant weights to be pruned in one shot. BINGO provides an accuracy-preserving pruning technique that is less computationally intensive than current methods, so that growth in AI need not also mean growth in model size.


To update or not to update? Neurons at equilibrium in deep models

arXiv.org Artificial Intelligence

Recent advances in deep learning optimization showed that, with some a-posteriori information on fully-trained models, it is possible to match their performance by training only a subset of their parameters. Such a discovery has a broad impact from theory to applications, driving research towards methods that identify the minimum subset of parameters to train without exploiting look-ahead information. However, the methods proposed so far do not match state-of-the-art performance, and they rely on unstructured, sparsely connected models. In this work we shift our focus from single parameters to the behavior of the whole neuron, exploiting the concept of neuronal equilibrium (NEq). When a neuron is in a configuration at equilibrium (meaning that it has learned a specific input-output relationship), we can halt its update; conversely, when a neuron is at non-equilibrium, we let its state evolve towards an equilibrium state by updating its parameters. The proposed approach has been tested on different state-of-the-art learning strategies and tasks, validating NEq and showing that the neuronal equilibrium depends on the specific learning setup.


Emerging Paradigms of Neural Network Pruning

arXiv.org Artificial Intelligence

Over-parameterization of neural networks benefits optimization and generalization but carries a cost in practice. Pruning is adopted as a post-processing solution to this problem; it aims to remove unnecessary parameters from a neural network with little loss in performance. It has been broadly believed that the resulting sparse neural network cannot be trained from scratch to comparable accuracy. However, several recent works (e.g., [Frankle and Carbin, 2019a]) challenge this belief by discovering random sparse networks which can be trained to match the performance of their dense counterparts. This new pruning paradigm has since inspired further methods for pruning at initialization. Despite the encouraging progress, how to reconcile these new pruning approaches with traditional pruning has not yet been explored. This survey seeks to bridge the gap by proposing a general pruning framework that accommodates both the emerging pruning paradigms and the traditional one. With it, we systematically reflect on the major differences and new insights brought by these new pruning approaches, with representative works discussed at length. Finally, we summarize the open questions as worthy future directions.


Lottery Tickets in Linear Models: An Analysis of Iterative Magnitude Pruning

arXiv.org Machine Learning

The lottery ticket hypothesis [Frankle and Carbin, 2019] asserts that a randomly initialised, densely connected feed-forward neural network contains a sparse sub-network that, when trained in isolation, attains equal or higher accuracy than the full network. The method used to find these sub-networks is iterative magnitude pruning (IMP). A network is given a random initialisation and trained by some form of gradient descent for a specified number of iterations, after which a proportion of its smallest weights (by absolute magnitude) is deleted. The remaining weights are then reset to their initialised values and the network is retrained. This procedure can be performed multiple times, resulting in a sequence of sparse yet trainable sub-networks.
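The IMP loop described above can be sketched for a linear least-squares model, in keeping with the paper's linear setting. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `train` and `iterative_magnitude_pruning`, the learning rate, the step count, and the per-round `prune_fraction` are all hypothetical choices.

```python
import numpy as np

def train(w_init, mask, X, y, lr=0.1, steps=200):
    """Gradient descent on mean squared error; pruned weights stay at zero.

    Training always restarts from w_init, which implements the
    'reset to initialised values' step of IMP.
    """
    w = w_init * mask
    n = len(y)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = w - lr * grad * mask  # masked update: only surviving weights move
    return w

def iterative_magnitude_pruning(X, y, w_init, prune_fraction=0.2, rounds=5):
    """Return the sequence of binary masks produced by IMP (illustrative sketch)."""
    mask = np.ones_like(w_init)
    masks = []
    for _ in range(rounds):
        w = train(w_init, mask, X, y)           # train the current sub-network
        alive = np.flatnonzero(mask)            # indices of surviving weights
        n_prune = int(prune_fraction * alive.size)
        if n_prune == 0:
            break
        order = np.argsort(np.abs(w[alive]))    # smallest magnitudes first
        mask[alive[order[:n_prune]]] = 0.0      # delete the smallest survivors
        masks.append(mask.copy())
    return masks
```

Each mask in the returned list defines one sub-network in the sequence. Calling `train(w_init, masks[-1], X, y)` retrains the sparsest one from its original initialisation.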


Sparse Transfer Learning via Winning Lottery Tickets

arXiv.org Machine Learning

The recently proposed Lottery Ticket Hypothesis of Frankle and Carbin (2019) suggests that the performance of over-parameterized deep networks is due to the random initialization seeding the network with a small fraction of favorable weights. These weights retain their dominant status throughout training -- in a very real sense, this sub-network "won the lottery" during initialization. The authors find sub-networks via unstructured magnitude pruning with 85-95% of parameters removed that train to the same accuracy as the original network at a similar speed, which they call winning tickets. In this paper, we extend the Lottery Ticket Hypothesis to a variety of transfer learning tasks. We show that sparse sub-networks with approximately 90-95% of weights removed achieve (and often exceed) the accuracy of the original dense network in several realistic settings. We experimentally validate this by transferring the sparse representation found via pruning on CIFAR-10 to SmallNORB and FashionMNIST for object recognition tasks.